Match viz dataframe column case to form_data fields for Snowflake, Oracle and Redshift #5487
Codecov Report
@@ Coverage Diff @@
## master #5487 +/- ##
==========================================
- Coverage 63.12% 63.08% -0.04%
==========================================
Files 349 349
Lines 22167 22203 +36
Branches 2462 2462
==========================================
+ Hits 13992 14006 +14
- Misses 8161 8183 +22
Partials 14 14
superset/viz.py
@@ -467,6 +468,36 @@ def get_data(self, df):
    def json_data(self):
        return json.dumps(self.data)

    def fix_df_column_case(self, df):
This doesn't really belong in viz.py, I'd much rather have this in db_engine_spec.py and only execute it for these weird databases. Maybe it's a method in BaseEngineSpec that applies conditionally based on a class attribute.
Good idea, I'll move it over.
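A minimal sketch of the shape suggested in this review thread: a method on the base engine spec that does nothing unless a class attribute opts the engine in. This is an illustration under assumptions, not the code that was merged. BaseEngineSpec and consistent_case_sensitivity are names used in this PR; the method name adjust_column_names, its signature, and the subclass name are hypothetical.

```python
class BaseEngineSpec:
    # Flag named in the PR description. Assumed meaning: True when the engine
    # returns column names in the same case that was requested.
    consistent_case_sensitivity = True

    @classmethod
    def adjust_column_names(cls, columns, form_fields):
        """Return column names aligned to the case used in form_fields.

        Hypothetical helper: engines with consistent case handling are
        left untouched, so the default behaviour does not change.
        """
        if cls.consistent_case_sensitivity:
            return list(columns)
        # Map lower-cased field name -> field name as written in the form.
        wanted = {field.lower(): field for field in form_fields}
        return [wanted.get(col.lower(), col) for col in columns]


class SnowflakeEngineSpec(BaseEngineSpec):
    # Illustrative subclass: opts in to the adjustment via the class attribute.
    consistent_case_sensitivity = False


print(BaseEngineSpec.adjust_column_names(["__TIMESTAMP"], ["__timestamp"]))
print(SnowflakeEngineSpec.adjust_column_names(["__TIMESTAMP"], ["__timestamp"]))
```

Keeping the switch on the class means the caller in viz.py needs no engine-specific branching; it only calls the method on whatever engine spec the database provides.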
This should now be ready for a new round of review and testing. I've updated the description to reflect the current state of the PR. @mistercrunch Can you take a new look at this? Again, all comments more than welcome.
LGTM would merge. Holding a bit for confirmation from people tagged here.
@villebro: I verified your PR using your branch and it worked against Snowflake data source.
Thanks @mmuru for confirming!
@mistercrunch This has now been confirmed on both Snowflake and Redshift. I ended up making a small refactor after your review (commit
It appears to work for me. The UI gets a little weird when I'm using Superset not installed from pip (this includes source). But I have a series of queries using Redshift and the server doesn't return the
Thanks for testing @minh5. Did you do the whole
FYI there are some unrelated bugs on master at the moment.
This also fixes #5353, aka the
Would merge, please resolve conflict
@mistercrunch FYI rebased and tested locally to work.
@mistercrunch Rebased and ready to merge again.
Rebased to kickstart CI, now clean bill of health. Ready for merging @mistercrunch
…acle and Redshift (apache#5487)
* Add function to fix dataframe column case
* Fix broken handle_nulls method
* Add case sensitivity option to dedup
* Refactor function definition and call location
* Remove added blank line
* Move df column rename logit to db_engine_spec
* Remove redundant variable
* Update comments in db_engine_specs
* Tie df adjustment to db_engine_spec class attribute
* Fix dedup error
* Linting
* Check for db_engine_spec attribute prior to adjustment
* Rename case sensitivity flag
* Linting
* Remove function that was moved to db_engine_specs
* Get metrics names from utils
* Remove double import and rename dedup variable
I am reading data from Redshift and trying to create a chart in Superset grouped by month, but I am not able to because there is no grouping option. What should I do?
@adderRavi can you elaborate on what you are trying to do? Sounds like you want to use a month time grain. Also, if you are having trouble with Redshift I would appreciate it if you could try #5827, as it fixes all the problems I have been able to identify with Redshift and a few other SQLAlchemy engines.
This might be slightly hacky, but I feel SQLAlchemy gives engines so much latitude that Superset might need to be slightly more forgiving when a query result has different case compared to the datasource/form metadata. A brief summary of what this PR does:

Adjust dataframe column case, by performing case-sensitive comparison of all columns in central fields in form_data (metrics, groupby) with column names in dataframe. Examples:
- __timestamp, dataframe: __timestamp -> Do nothing.
- __timestamp, dataframe: __TIMESTAMP -> Rename dataframe column name to __timestamp.
- __timeSTAMP, dataframe: __TIMESTAMP -> Rename dataframe column name to __timeSTAMP.

The dataframe is adjusted prior to caching in BaseViz.get_df_payload(), with the logic for adjustments located in db_engine_specs, which is controlled by the consistent_case_sensitivity attribute.

To minimize risk of collisions, dedup has been changed so that it can dedupe in both a case-sensitive and a case-insensitive manner. Default handling is still case-sensitive, as before, but for affected engines (Snowflake, Oracle, Redshift) handling will be case-insensitive. Example from test case (note the last Bar which is seen as a duplicate despite different case):

Other changes:
- handle_nulls() was implemented
- full_table_name row in db_engine_spec

Feel free to try it out, and don't be shy about giving feedback/critique. As stated above, this might be borderline hacky, but it seems to work fairly well, and might make it easier to add new engines later on. This fix should also be backwards compatible (except for the handle_nulls() part, which was recently added), at least with 0.26 and 0.25.
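The two mechanisms described above can be sketched in plain Python. This is a hedged illustration rather than the code in the PR: the function names column_case_renames and dedup_names, their signatures, and the "__" suffix are assumptions made for the example. The first function works out which dataframe columns differ from a form_data field only by case; the second dedupes a list of names with an optional case-insensitive mode.

```python
def column_case_renames(df_columns, form_fields):
    """Map dataframe column names to the case used in form_fields.

    Only columns that match a form field when case is ignored, but differ
    from it as written, end up in the returned mapping.
    """
    wanted = {field.lower(): field for field in form_fields}
    renames = {}
    for col in df_columns:
        target = wanted.get(col.lower())
        if target is not None and target != col:
            renames[col] = target
    return renames


def dedup_names(names, suffix="__", case_sensitive=True):
    """Append a running counter to repeated names, optionally ignoring case."""
    seen = {}
    result = []
    for name in names:
        key = name if case_sensitive else name.lower()
        if key in seen:
            seen[key] += 1
            name = f"{name}{suffix}{seen[key]}"
        else:
            seen[key] = 0
        result.append(name)
    return result


# The three cases listed in the description.
print(column_case_renames(["__timestamp"], ["__timestamp"]))
print(column_case_renames(["__TIMESTAMP"], ["__timestamp"]))
print(column_case_renames(["__TIMESTAMP"], ["__timeSTAMP"]))

# Case-insensitive mode treats the trailing "Bar" as one more duplicate of "bar".
print(dedup_names(["foo", "bar", "bar", "bar", "Bar"], case_sensitive=False))
```

The mapping returned by column_case_renames is in the form accepted by pandas DataFrame.rename(columns=...), so it could be applied to the query result before caching. Deduping case-insensitively first matters here because two columns that differ only by case would otherwise collide once they are renamed to the same form_data field.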